
7 research outputs found

Versification and Authorship Attribution

Author: Plecháč Petr
Šeļa Artjoms
Publication venue: 'Charles University in Prague, Karolinum Press'
Publication date: 01/01/2021

The technique known as contemporary stylometry uses different methods, including machine learning, to discover a poem’s author based on features like the frequencies of words and character n-grams. However, there is one potential textual fingerprint stylometry tends to ignore: versification, or the very making of language into verse. Using poetic texts in three different languages (Czech, German, and Spanish), Petr Plecháč asks whether versification features like rhythm patterns and types of rhyme can help determine authorship. He then tests his findings on two unsolved literary mysteries. In the first, Plecháč distinguishes the parts of the Elizabethan verse play The Two Noble Kinsmen written by William Shakespeare from those written by his coauthor, John Fletcher. In the second, he seeks to solve a case of suspected forgery: how authentic was a group of poems first published as the work of the nineteenth-century Russian author Gavriil Stepanovich Batenkov? This book of poetic investigation should appeal to literary sleuths the world over.

CU Digital Repository

Directory of Open Access Books (DOAB)

1800-luvun alun ”venäläinen laulu” korpustutkimuksen valossa [The early 19th-century ‘Russian song’ in the light of corpus research]

Author: Pylsy Mika
Toivola Riku
Šeļa Artjoms
Publication venue: Venäjän ja Idäntutkimuksen seura (VIETS)
Publication date: 24/01/2019

In this article, ‘Russian songs’ from the beginning of the 19th century, i.e. imitations or ‘stylisations’ of non-ritual lyric Russian folksongs, are analysed using the methods of big data research. A corpus of ‘Russian songs’ is compared to corpora consisting of both folk songs and literary texts. The poetics of ‘Russian songs’, surprisingly enough, do not resemble the folk songs they are supposed to be imitating, and come closer to the literary norms of their time.

The article deals with ‘Russian songs’, that is, pastiches of the non-ritual, lyric Russian folk song. These songs or romances resemble folk songs in form but are most often the work of an individual poet. The relationship between a pastiche and the object of its imitation is considerably harder to describe theoretically than the use of material borrowed from folk poetry in literature in general. Research in the opposite direction, i.e. research examining how literary works are adapted into folk poetry, has at a general level helped to explain the mechanisms of verbal folk tradition. This article, however, approaches Russian literary history and its stylistic variation by means of corpus analysis. A text corpus compiled from ‘Russian songs’ is compared with corpora compiled from both folk poetry and literary texts. Stylometric methods are used to describe a reduced model that shows the correspondences and differences between the corpora. This makes it possible to approach quantitatively the problem of how elements of folk poetry are selected and transmitted in pastiches. The analysis shows that the poetics of ‘Russian songs’ do not resemble the folk songs they imitate, but are closer to the general literary norms of their time.

Journal.fi

Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets

Author: Byszuk Joanna
Eder Maciej
Idziak Jan
Leśniak Albert
Woźniak Michał
Šeļa Artjoms
Publication date: 28/03/2023

The paper discusses an approach to deciphering large collections of handwritten index cards of historical dictionaries. Our study provides a working solution that reads the cards and links their lemmas to a searchable list of dictionary entries, for a large historical dictionary entitled the Dictionary of the 17th- and 18th-century Polish, which comprises 2.8 million index cards. We apply a tailored handwritten text recognition (HTR) solution that involves (1) an optimized detection model; (2) a recognition model to decipher the handwritten content, designed as a spatial transformer network (STN) followed by a convolutional neural network (RCNN) with a connectionist temporal classification layer (CTC), trained using a synthetic set of 500,000 generated Polish words of different lengths; (3) a post-processing step using constrained Word Beam Search (WBS): the predictions were matched against a list of dictionary entries known in advance. Our model achieved an accuracy of 0.881 on the word level, which outperforms the base RCNN model. Within this study we produced a set of 20,000 manually annotated index cards that can be used for future benchmarks and transfer learning HTR applications.
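The idea behind the post-processing step, matching a raw prediction against dictionary entries known in advance, can be illustrated with a much simpler stand-in. The sketch below is not Word Beam Search and is not the authors' implementation; it only snaps a noisy prediction to the nearest lexicon entry by Levenshtein distance. The function names and the tiny lexicon are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def snap_to_lexicon(prediction: str, lexicon: list[str]) -> str:
    """Return the lexicon entry closest to the prediction (ties: first entry wins)."""
    return min(lexicon, key=lambda entry: edit_distance(prediction, entry))


# Hypothetical dictionary entries and a noisy recognizer output.
lexicon = ["kot", "koń", "kosz", "okno"]
print(snap_to_lexicon("kosx", lexicon))
```

A real constrained decoder works on the recognizer's per-character probabilities rather than on a single decoded string, so this nearest-entry lookup should be read only as an approximation of the matching idea.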

arXiv.org e-Print Archive

Gyenge műfajok: a költői versmérték és a jelentés közötti kapcsolat modellálása az orosz költészetben [Weak genres: modelling the relationship between poetic meter and meaning in Russian poetry]

Author: Leibov Roman
Orekhov Boris
Šeļa Artjoms
Publication venue: 'Eotvos Lorand University (ELTE)'
Publication date: 01/01/2021

The paper attempts to formalize an existing theory of poetry known as the “semantic field of meter”, which claims that the different metrical forms of modern lyric poetry accumulate and retain certain semantic associations. We examined a broad corpus of Russian poetry (1750–1950) with the LDA topic modelling algorithm in order to represent each poem in a topic space and each meter as a distribution of topic probabilities. Using unsupervised classification and extensive sampling, we show that the link between form and meaning is strong both within and between verse forms: two samples of the same meter often appear very similar, and two verse forms of the same family mostly end up in the same cluster as well. This link can be detected even when the corpus is chronologically controlled, and it is not a consequence of population size. We argue that a similar approach can also be applied to comparing the semantic fields of languages and poetic traditions, which could yield relevant answers to the most fundamental questions of literary history.

ELTE Digital Institutional Repository (EDIT)

Semantics of European poetry is shaped by conservative forces: The relationship between poetic meter and meaning in accentual-syllabic verse

Author: Lassche Alie
Plecháč Petr
Šeļa Artjoms
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 15/09/2021

Recent advances in cultural analytics and large-scale computational studies of art, literature and film often show that long-term change in the features of artistic works happens gradually. These findings suggest that conservative forces that shape creative domains might be underestimated. To this end, we provide the first large-scale formal evidence of the persistent association between poetic meter and semantics in 18th- and 19th-century European literatures, using Czech, German and Russian collections with additional data from English poetry and early modern Dutch songs. Our study traces this association through a series of clustering experiments using the abstracted semantic features of 150,000 poems. With the aid of topic modeling we infer semantic features for individual poems. Texts were also lexically simplified across collections to increase generalizability and decrease the sparseness of word frequency distributions. Topics alone enable recognition of the meters in each observed language, as may be seen from highly robust clustering of same-meter samples (median Adjusted Rand Index between 0.48 and 1). In addition, this study shows that the strength of the association between form and meaning tends to decrease over time. This may reflect a shift in aesthetic conventions between the 18th and 19th centuries as individual innovation was increasingly favored in literature. Despite this decline, it remains possible to recognize the semantics of meters from the past or the future, which suggests the continuity of semantic traditions while also revealing the historical variability of conditions across languages. This paper argues that distinct metrical forms, which are often copied in a language over centuries, also maintain long-term semantic inertia in poetry. Our findings thus highlight the role of the formal features of cultural items in influencing the pace and shape of cultural evolution.
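For readers unfamiliar with the Adjusted Rand Index reported above, the following is a minimal sketch of how the score is computed from two labelings of the same samples, using the standard pair-counting formula. It is an illustration only, not the authors' code; the meter names and cluster ids in the usage lines are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no sample pairs to compare
    # Pairs of samples placed together in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical example: true meters versus cluster assignments of four samples.
meters = ["iamb4", "iamb4", "trochee4", "trochee4"]
clusters = [1, 1, 0, 0]
print(adjusted_rand_index(meters, clusters))
```

A score of 1 means the clustering reproduces the meter labels exactly (up to renaming of clusters), while values near 0 indicate agreement no better than chance.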

arXiv.org e-Print Archive

PubMed Central

Leiden University Scholarly Publications

Deep transitions: towards a comprehensive framework for mapping major continuities and ruptures in industrial modernity

Author: Kanger Laur
Orru Kati
Pahker Anna-Kati
Sillak Silver
Tinits Peeter
Tiwari Amaresh Kumar
Vaik Kristiina
Šeļa Artjoms
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022

The world is confronted by a socio-ecological emergency, requiring rapid and deep decarbonization of a broad range of socio-technical systems. A recent Deep Transitions framework argues that this fundamentally unsustainable trajectory has been generated by the co-evolutionary dynamics of multiple systems during the last 250 years. Altering this direction requires transformation in industrial modernity: a set of the most fundamental ideas, institutions, and practices characterizing every industrial society to date. Although the proponents of the framework suggest that this shift has been unfolding since the 1960s, no attempts have been made to operationalize the concept of industrial modernity and to assess this claim. This paper develops a comprehensive multi-dimensional and multi-domain approach for the measurement of industrial modernity. As such, it seeks to provide empirical evidence of long-term continuities and emerging ruptures in the dominant ideas, institutions, and practices of industrial societies along the domains of environment and technology. Using a methodologically novel approach in which the text mining of newspapers is combined with data from various databases, the paper provides results from three countries (Australia, Germany, Soviet Union/Russia) between 1900 and 2020. Despite considerable country-level differences, the results show shifts in public environmental discourse from the 1960s, followed by institutional changes from the 1980s but with only a modest change in practices. We also observe some change in the direction of innovative activities and their regulation, coupled with a resurgent optimism in technology-environment discourse. The findings tentatively suggest that industrial modernity might be in the process of hollowing out along ideational and institutional dimensions in the environmental domain, but less so in the domain of technology and innovation.

VBN

Sussex Research Online

CLS Infra Computational Literary Studies Infrastructure

Computational Literary Studies Infrastructure, funded by the Horizon 2020 grant scheme, is a four-year, pan-European project that aims to unify the diverse landscape of computational text analysis, in terms of available texts, tools, methods, practices and so forth, within its growing international user community. The project started in February 2021, meaning that it has been underway for just over a year. In our poster we discuss the various deliverables and activities that have come out of the CLS INFRA project in its first quarter to give an idea of its impact in practice.

Biblio at Institute of Formal and Applied Linguistics